153-2008: SAS/OR®: Rigorous Constrained Optimized Binning for Credit Scoring
Abstract
Credit scoring can be defined as a statistical modeling technique used to assign risk to credit applicants or to existing credit accounts. We present a new process that enhances the formulation and solution approach in the SAS® system during the so-called “binning” phase by exploiting SAS/OR optimization capabilities to approach the problem from a mathematically rigorous perspective. Usually, attributes such as age, income, and so on, are segmented into grouping intervals, with the aim of creating bins for scorecards that maximize correlation with these attributes. A critical aspect of the binning process is the enforcement of various constraints, such as minimum/maximum number of bins, minimum/maximum bin widths, and maximum number of observations per bin. These requirements significantly complicate the binning process. Our approach is rigorous in the sense that global linear constraints are implemented exactly with the use of mixed-integer linear programming. Furthermore, the methodology can be extended to a fully rigorous approach (which incorporates an additional nonlinear constraint of imposed monotonicity, at the cost of computational time) within the same mathematical programming context by the addition of variables and constraints related to “weight of evidence” (WOE). The key enabling factor is the mathematical programming model structure and careful implementation of constraints. We present some results of the binning process proposed here and related scorecards, and we compare the solutions against the current state-of-the-art approach. The methodology presented here will be available through SAS® Enterprise Miner™ for Credit Scoring in a near-future release.
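To make the quantities in the abstract concrete, the following is a minimal Python sketch that computes WOE per bin and checks a candidate binning against the kinds of constraints the abstract lists (number of bins, observations per bin, monotone WOE). It is not the paper's mixed-integer formulation and does not use SAS/OR. The function names, the sample counts, and the sign convention WOE = ln(share of goods / share of bads) are assumptions made for illustration only.

```python
import math

def woe_per_bin(goods, bads):
    """WOE for each bin from per-bin counts of good and bad accounts.

    Assumed convention: WOE = ln(share of goods / share of bads).
    Every bin must contain at least one good and one bad account,
    otherwise the logarithm is undefined.
    """
    total_good = sum(goods)
    total_bad = sum(bads)
    return [
        math.log((g / total_good) / (b / total_bad))
        for g, b in zip(goods, bads)
    ]

def is_monotone(values):
    """True if the sequence is non-decreasing or non-increasing."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

def satisfies_constraints(goods, bads, min_bins, max_bins, max_obs_per_bin):
    """Check a candidate binning against bin-count, bin-size, and
    WOE-monotonicity requirements (hypothetical helper)."""
    n_bins = len(goods)
    if not (min_bins <= n_bins <= max_bins):
        return False
    if any(g + b > max_obs_per_bin for g, b in zip(goods, bads)):
        return False
    return is_monotone(woe_per_bin(goods, bads))

# Hypothetical counts for four ordered bins of a single attribute.
goods = [100, 200, 300, 400]
bads = [80, 60, 40, 20]
woe = woe_per_bin(goods, bads)
```

A check of this kind only verifies one candidate binning. The approach described in the abstract instead searches over binnings with a mixed-integer linear program so that the linear constraints hold exactly.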
Similar resources
A Necessary Condition for a Good Binning Algorithm in Credit Scoring
Binning is a categorization process to transform a continuous variable into a small set of groups or bins. Binning is widely used in credit scoring. In particular, it can be used to define the Weight of Evidence (WOE) transformation. In this paper, we first derive an explicit solution to a logistic regression model with one independent variable that has undergone a WOE transformation. We then u...
Paper 1323-2017: Real AdaBoost: Boosting for Credit Scorecards and Similarity to WOE Logistic Regression
AdaBoost is a machine learning algorithm that builds a series of small decision trees, adapting each tree to predict difficult cases missed by the previous trees and combining all trees into a single model. We will discuss the AdaBoost methodology and introduce the extension called Real AdaBoost. Real AdaBoost comes from a strong academic pedigree: its authors are pioneers of machine learning a...